Abstract

We consider avoiding squares and overlaps over the natural numbers, using a greedy
algorithm that chooses the least possible integer at each step; the word generated is
lexicographically least among all such infinite words. In the case of avoiding squares, the
word is $01020103\cdots$, the familiar ruler function, and is generated by iterating a uniform
morphism. The case of overlaps is more challenging. We give an explicitly defined
morphism $\varphi \colon \mathbb{N}^* \to \mathbb{N}^*$ that generates the lexicographically least infinite overlap-free
word by iteration. Furthermore, we show that for all $h, k \in \mathbb{N}$ with $h \le k$, the word
$\varphi^{k-h}(h)$ is the lexicographically least overlap-free word starting with the letter $h$ and
ending with the letter $k$, and give some of its symmetry properties.

* Supported by an NSERC Alexander Graham Bell Canada Graduate Scholarship.

1 Introduction

Avoidability problems play a significant role in combinatorics on words. Typically we are
given a finite alphabet $\Sigma$, and we want to know if there exist infinite words over $\Sigma$ that avoid
various patterns, such as squares and overlaps. A square is a nonempty word of the form
$xx$, such as the French word chercher. An overlap is a word of the form $axaxa$, where $a$
is a single letter and $x$ is a (possibly empty) word, such as the French word entente. An
overlap is sometimes called a $2^+$-power, because it is just slightly more than a square. In
two famous papers, the Norwegian mathematician Axel Thue [TQ1 [TTJ [3] proved that there 
exist infinite binary words containing no overlaps, and infinite words over a 3-letter alphabet 
containing no squares. 

Suppose we try to generate an infinite squarefree word over the alphabet $\Sigma_3 = \{0, 1, 2\}$
letter by letter, using the familiar backtracking algorithm [6]. At every step, we choose the
smallest letter possible that maintains the property of not having a square; if no such letter
exists, we are forced to backtrack to a previous letter and increment it. For example, this
approach generates the string $w = 0102010$, at which point no letter in $\Sigma_3$ can be appended
without getting a square. Thus we are forced to backtrack one letter, replacing the last letter
of $w$ with 2 to obtain 0102012, and we continue from there. Although this approach will
eventually generate the lexicographically least squarefree infinite word over $\Sigma_3$, surprisingly
little is known about it. For example, we do not even know whether the number of positions 
that one has to backtrack is bounded. 

This suggests dropping the backtracking entirely, by enlarging our alphabet to the set 
of natural numbers N. (For some recent papers on words and morphisms over an infinite 
alphabet, see [21 El IE]-) In this situation, the concept of irreducibility of words and morphisms, 
introduced in Section 4, becomes relevant. As we will see in Section 5, the resulting squarefree
word,

$w_2 = 01020103010201040102010301020105\cdots$,

is a famous one; it is the so-called "ruler" sequence, where the $n$th term is the exponent
of the highest power of 2 dividing $n$, and it can be generated by iterating an irreducible
squarefree morphism.

Instead of avoiding squares over N, we could try to avoid overlaps. Using a greedy 
algorithm without backtracking, we generate the word 

$w_{2^+} = 0010011001002001001100100210010020010011001002001001200100110010020010011001003\cdots$,

with many remarkable properties. Among other things, $w_{2^+}$ is generated by iterating a
certain irreducible overlap-free morphism, but in this case, the morphism is much more
complicated. This is discussed in Sections 6 and 7.

2 Notation 

Our notation is mostly standard, but we collect it here for ease of reference. 

An alphabet $\Sigma$ is a set of symbols, called letters. Although alphabets are usually finite
in the literature on combinatorics on words, in this paper we also consider the alphabet
$\mathbb{N} = \{0, 1, 2, \ldots\}$ of natural numbers.

A word over this alphabet is a (possibly empty) string of letters chosen from $\Sigma$. The
empty word is denoted $\varepsilon$, and the length of a word $w$ is denoted $|w|$. We write $w[n]$ for the
$n$th letter of $w$ (with indexing starting at 1).

The set of all finite words over $\Sigma$ is denoted by $\Sigma^*$, the set of non-empty finite words by
$\Sigma^+$, and the set of one-way right-infinite words by $\Sigma^\omega$.

The basic operation on words is concatenation. Usually we represent concatenation by
juxtaposition, so that $x$ concatenated with $y$ is written $xy$. However, we sometimes write it
as $x \cdot y$ for clarity; for example, $(n + 1) \cdot (n + 2)$ denotes the word of length 2 consisting of
the letter $n + 1$ followed by the letter $n + 2$.

A word $y$ is a factor of a word $w$ if there exist words $x, z$ such that $w = xyz$. If $x = \varepsilon$,
then $y$ is a prefix of $w$; if $z = \varepsilon$, then $y$ is a suffix of $w$. If $y$ is a prefix (resp., suffix) of $w$,
then we write $y^{-1}w$ (resp., $wy^{-1}$) to denote the word obtained by removing the prefix (resp.,
suffix) $y$ from $w$.

Given an ordering on the elements of $\Sigma$, there is an associated lexicographic order on
$\Sigma^* \cup \Sigma^\omega$. We write $x < y$ if $x$ is a proper prefix of $y$, or if we can write $x = wcx'$ and $y = wdy'$,
where $w$ is a common prefix of $x$ and $y$ and $c, d$ are letters with $c < d$.

Given a set $P$ of words, called a pattern, we say that $w$ avoids $P$ (or that $w$ is $P$-free)
if no word of $P$ is a factor of $w$. Some examples of interesting patterns include the squares
$\{xx : x \in \Sigma^+\}$, the cubes $\{xxx : x \in \Sigma^+\}$, and the overlaps $\{cxcxc : c \in \Sigma, x \in \Sigma^*\}$.

Let $\Sigma, \Delta$ be alphabets. A morphism is a function $h \colon \Sigma^* \to \Delta^*$ such that $h(xy) = h(x)h(y)$
for all $x, y \in \Sigma^*$. To define a morphism, it suffices to give $h(c)$ for all letters $c \in \Sigma$.

The basic operation on morphisms is composition. If $\Sigma, \Delta, \Gamma$ are alphabets and $h \colon \Sigma^* \to \Delta^*$,
$g \colon \Delta^* \to \Gamma^*$ are morphisms, then their composition $g \circ h \colon \Sigma^* \to \Gamma^*$ is also a morphism.
If $\Sigma = \Delta$, so that $h \colon \Sigma^* \to \Sigma^*$, we can iterate it. We write $h^n$ for the $n$-fold composition of
$h$ with itself, and let $h^0$ denote the identity map.

If $c \in \Sigma$ is a letter and $h \colon \Sigma^* \to \Sigma^*$ is a morphism with $h(c) = cx$ for some word $x$, then

$h^n(c) = c \cdot x \cdot h(x) \cdot h^2(x) \cdots h^{n-1}(x)$.

If $h^n(x) \ne \varepsilon$ for all $n \ge 0$, then there is a unique infinite word of which $c, h(c), h^2(c), \ldots$ are
all prefixes, and we write it as $h^\omega(c)$.

Given a property of words, we say that a morphism has that property if it preserves the
property when applied to words. For example, given a pattern $P$, we say that the morphism
$h$ is $P$-free if $h(w)$ is $P$-free whenever $w$ is.

Given an alphabet $\Sigma$, we let $S \colon \Sigma^+ \to \Sigma^+$ be the left cyclic shift operator, defined by
$S(cx) = xc$ for all $c \in \Sigma$ and $x \in \Sigma^*$, and we let $R \colon \Sigma^* \to \Sigma^*$ be the reversal operator, defined
by $R(c) = c$ for $c \in \Sigma$ and $R(xy) = R(y)R(x)$ for $x, y \in \Sigma^*$. Note that these operators are
not morphisms.

3 Backtracking and no-backtracking algorithms

As we noted, given a pattern $P$, we can ask whether there are infinite words avoiding $P$. For
a finite alphabet $\Sigma$, this turns out to be equivalent to the existence of arbitrarily long finite
words avoiding $P$, as the following algorithm shows:

1. Start with the empty word $w_0 = \varepsilon$.

2. For each $i = 0, 1, 2, \ldots$, let $c_i \in \Sigma$ be a letter such that there are arbitrarily long words
avoiding $P$ with $w_i c_i$ as a prefix, and set $w_{i+1} = w_i c_i$. Note that the existence of such
a $c_i$ is guaranteed by the pigeonhole principle.

3. Since $w_i$ is a prefix of $w_j$ whenever $i < j$, we can take $w = \lim_{i \to \infty} w_i$. Then $w$ is an
infinite word over $\Sigma$ avoiding $P$.

If we put an ordering on the letters of $\Sigma$ and choose $c_i$ to be minimal at each step, then
the algorithm actually shows something slightly stronger: if there are arbitrarily long words
avoiding $P$, then there is a lexicographically least infinite word $\mathbf{a}$ avoiding $P$. Since it is not
clear a priori how to choose $c_i$ in step 2, we also have the following, more explicit algorithm
which will either show that there are no infinite words avoiding $P$, or converge to $\mathbf{a}$:

1. Start with the empty word $w = \varepsilon$. Let $a$ and $z$ be the lexicographically smallest and
largest letters in $\Sigma$, respectively.

2. Repeat this step as long as possible: if $w$ does not have a suffix in $P$, append $a$ to
it. Otherwise, remove all trailing $z$'s from $w$, and replace the last letter of $w$ by the
lexicographically next one in $\Sigma$. This will fail if $w$ contains only $z$'s.

3. If the preceding step ever fails, conclude that there is no infinite word avoiding $P$.
Otherwise, $w$ will eventually start with longer and longer prefixes of $\mathbf{a}$.
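
As a minimal sketch of the explicit backtracking algorithm above, assume the pattern is the set of squares and the alphabet is $\{0, \ldots, m - 1\}$. The function names `has_square_suffix` and `backtrack` are illustrative choices, not taken from the paper, and the step count is only a way to look at intermediate states.

```python
def has_square_suffix(w):
    """Return True if the list w ends with a nonempty square xx."""
    n = len(w)
    return any(w[n - L:] == w[n - 2 * L:n - L] for L in range(1, n // 2 + 1))


def backtrack(alphabet_size, steps):
    """Run `steps` iterations of step 2 over {0, ..., alphabet_size - 1}.

    Returns the current word, or None if the step fails because w consists
    only of copies of the largest letter (no infinite squarefree word exists).
    """
    w = []
    z = alphabet_size - 1  # largest letter
    for _ in range(steps):
        if not has_square_suffix(w):
            w.append(0)  # append the smallest letter
        else:
            while w and w[-1] == z:  # remove all trailing z's
                w.pop()
            if not w:
                return None  # the step fails
            w[-1] += 1  # replace the last letter by the next one
    return w
```

Over three letters, `backtrack(3, 11)` reaches the word 0102010 from the introduction, and five steps later the word has been repaired to 0102012. Over two letters the procedure fails quickly, reflecting the fact that there is no infinite binary squarefree word.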

Unfortunately, while the algorithm converges to $\mathbf{a}$ (if it exists), it can be hard to determine
if a given letter of $w$ is there to stay, or if it will eventually be replaced. One way around
this difficulty is to consider patterns where no backtracking actually occurs in the second
algorithm. In such cases, we get the no-backtracking algorithm:

1. Start with the empty word $w_0 = \varepsilon$.

2. For each $i = 0, 1, 2, \ldots$, let $c_i \in \Sigma$ be the lexicographically first letter in $\Sigma$ such that
$w_i c_i$ does not have a suffix in $P$, if it exists, and set $w_{i+1} = w_i c_i$.

3. If the preceding step never fails, then the $w_i$ are the prefixes of $\mathbf{a}$.
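
The no-backtracking algorithm is short enough to sketch directly. The following Python illustration works over $\mathbb{N}$, where a fresh letter can always be appended, so the inner loop terminates. The helper names (`ends_with_square`, `ends_with_overlap`, `greedy`) are assumptions made for this sketch only.

```python
def ends_with_square(w):
    """True if w has a nonempty suffix of the form xx."""
    n = len(w)
    return any(w[n - L:] == w[n - 2 * L:n - L] for L in range(1, n // 2 + 1))


def ends_with_overlap(w):
    """True if w has a suffix cxcxc, i.e. a suffix of length 2L + 1 with period L >= 1."""
    n = len(w)
    return any(w[n - 2 * L - 1:n - L] == w[n - L - 1:]
               for L in range(1, (n - 1) // 2 + 1))


def greedy(ends_badly, length):
    """Append, at each step, the least natural number that creates no bad suffix."""
    w = []
    while len(w) < length:
        c = 0
        while ends_badly(w + [c]):
            c += 1
        w.append(c)
    return w
```

For instance, `greedy(ends_with_square, 16)` and `greedy(ends_with_overlap, 13)` reproduce the first letters of the two words displayed in the introduction.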

This is what we consider in this paper. For the patterns of squares and overlaps over $\mathbb{N}$,
the no-backtracking algorithm works, and we construct the resulting words. The squarefree
word is well-known, but the overlap-free word is not, and we explore its structure.

4 Irreducibility of words and morphisms 

In the context of the no-backtracking algorithm, the concept of irreducibility becomes
relevant. Given a pattern $P$ over an ordered alphabet $\Sigma$, we say that a word $w$ is irreducible
at position $p$ (with respect to $P$) if replacing $w[p]$ with any lexicographically smaller letter
in $\Sigma$ creates a new word with a factor in $P$ ending at position $p$. (Note that we allow the
possibility that $w$ itself already has a factor in $P$ ending at that position.) In particular, if
$w[p]$ is the smallest letter of $\Sigma$, then $w$ is automatically irreducible at $p$.

If a word is irreducible at every position, we simply say that it is irreducible. Sometimes
we will speak of words $w$ that are irreducible after the first position, meaning that $w$ is
irreducible at positions $2, 3, \ldots, |w|$.

These concepts are related to the lexicographic ordering over $\Sigma$ in the following way. If
$v$ is irreducible with respect to $P$ and $w$ is $P$-free, then either $w$ is a prefix of $v$, or $v < w$
lexicographically. This can be seen by considering a longest common prefix $x$ of $v$ and $w$.
Either this is all of $w$, or all of $v$, or each word contains a letter following $x$, in which case the
next letter of $v$ must be strictly smaller than the next letter of $w$. It follows from this that
if an infinite word $w$ is $P$-free and irreducible, then it is the lexicographically least infinite
$P$-free word, and the finite $P$-free and irreducible words are exactly the prefixes of $w$.
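
To make the definition concrete, here is a small Python sketch of an irreducibility test, specialised to the pattern of squares over $\mathbb{N}$. The function names are hypothetical, and positions are 1-indexed to match the text.

```python
def ends_with_square(w):
    """True if w has a nonempty suffix of the form xx."""
    n = len(w)
    return any(w[n - L:] == w[n - 2 * L:n - L] for L in range(1, n // 2 + 1))


def is_irreducible_at(w, p, ends_badly=ends_with_square):
    """Check irreducibility of w at the 1-indexed position p.

    Every smaller letter c placed at position p must create a factor of the
    pattern ending at p, i.e. a bad suffix of the prefix of length p.
    """
    prefix = w[:p - 1]
    return all(ends_badly(prefix + [c]) for c in range(w[p - 1]))


def is_irreducible(w, ends_badly=ends_with_square):
    """Check irreducibility at every position of w."""
    return all(is_irreducible_at(w, p, ends_badly) for p in range(1, len(w) + 1))
```

The word 01020103 passes this test, whereas 012 is not irreducible at position 3, since replacing the 2 by 0 gives the squarefree word 010.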

5 A squarefree word without backtracking 

As a warmup, let us consider the case of squarefree words. For the rest of this section, we
consider the pattern $P = \{xx : x \in \mathbb{N}^+\}$ of squares. Any finite squarefree word $w$ over $\mathbb{N}$ can
be extended to a longer squarefree word by appending a letter that does not appear in $w$, so
it follows that the no-backtracking algorithm will work and generate the lexicographically
least infinite squarefree word over $\mathbb{N}$,

$w_2 = 01020103010201040102010301020105\cdots$.

This is the well-known ruler sequence, which is sequence A007814 in Sloane's Encyclopedia 
[0]. For other mentions of the ruler sequence, see [TJ Example 8, p. 187] and [I]. 

Theorem 1. Let $\gamma \colon \mathbb{N}^* \to \mathbb{N}^*$ be the morphism defined by $\gamma(i) = 0 \cdot (i + 1)$. Then $w_2 = \gamma^\omega(0)$.

Proof. We prove the result by showing that the morphism $\gamma$ is squarefree and irreducible.

Consider the morphism $\rho$ defined by $\rho(0) = \varepsilon$ and $\rho(i) = i - 1$ for $i \ge 1$. Then it is easy
to see that $\rho$ is a left inverse of $\gamma$, in the sense that $\rho(\gamma(w)) = w$ for all words $w$. Suppose
$\gamma(w)$ contains a square $xx$. Then $x$ contains at least one nonzero letter, so $w = \rho(\gamma(w))$
contains the nonempty square $\rho(x)\rho(x)$. Hence $\gamma$ is a squarefree morphism.

Now consider the letter $d$ at position $p$ in $\gamma(w)$, and suppose we replace it by a letter
$c < d$. If $p$ is odd, then $d = 0$, so this cannot be done and $\gamma(w)$ is irreducible at this position.
If $p$ is even, then $\gamma(w)[p - 1]$ is 0, so taking $c = 0$ creates the square 00 ending at position
$p$. On the other hand, taking $c > 0$ creates a word of the form $\gamma(w')$, where $w'$ is obtained
from $w$ by replacing the letter $d - 1$ at position $p/2$ by the smaller letter $c - 1$. The word
$\gamma(w')$ has a square ending at position $p$ if and only if $w'$ has a square ending at position $p/2$.
Thus, if $w$ is irreducible, then $\gamma(w)$ is irreducible, so $\gamma$ is an irreducible morphism.

It now follows from the discussion in Section 4 that $\gamma^\omega(0)$ is the lexicographically least
infinite squarefree word over $\mathbb{N}$. □

The following summarizes some folklore results about the ruler sequence.

Corollary 2. Let $w_2 = 01020103\cdots$, and let $\gamma$ be the morphism defined above. Then

(a) $|\gamma^i(j)| = 2^i$ for $i \ge 0$;

(b) $\gamma^i(j)$ starts with 0 and ends with $i + j$ for $i \ge 1$;

(c) $w_2[i] = \nu_2(i)$, the exponent of the highest power of 2 dividing $i$;

(d) The least index $i$ such that $w_2[i] = j$ is $i = 2^j$;

(e) The letter $j$ occurs in $w_2$ with limiting frequency $2^{-j-1}$.

Proof. Left to the reader. □
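
Although the proof is left to the reader, parts (a) to (d) are easy to spot-check numerically. The sketch below iterates the morphism $\gamma(i) = 0 \cdot (i + 1)$ on lists of integers; the names `gamma`, `gamma_power` and `nu2` are chosen here for illustration, and a finite check of this kind is of course not a proof.

```python
def gamma(word):
    """Apply the morphism gamma(i) = 0 . (i + 1) to a list of natural numbers."""
    out = []
    for i in word:
        out += [0, i + 1]
    return out


def gamma_power(word, n):
    """Apply gamma n times."""
    for _ in range(n):
        word = gamma(word)
    return word


def nu2(n):
    """Exponent of the highest power of 2 dividing the positive integer n."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    return e
```

With `w = gamma_power([0], 6)`, a prefix of length 64 of $w_2$, one can compare `w[i - 1]` against `nu2(i)` and locate the first occurrence of each letter with `w.index(j) + 1`.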

6 An overlap-free word without backtracking 

For the rest of this paper, we consider the pattern $P = \{cxcxc : c \in \mathbb{N}, x \in \mathbb{N}^*\}$ of overlaps.
As with squares, any finite overlap-free word over $\mathbb{N}$ can be extended to a longer overlap-free
word by appending a letter that does not appear in it, so the lexicographically least infinite
overlap-free word $w_{2^+}$ over $\mathbb{N}$ exists and can be generated by using the no-backtracking
algorithm.

We will show that $w_{2^+}$ can be written as $\varphi^\omega(0)$ for a certain remarkable morphism
$\varphi \colon \mathbb{N}^* \to \mathbb{N}^*$ with

$\varphi(0) = 001$

$\varphi(1) = 1001002$

$\varphi(2) = 200100110010020010011001003$

To do this, we will first define $\varphi$, and then show that it is both overlap-free and irreducible.
One particularly useful definition of $\varphi \colon \mathbb{N}^* \to \mathbb{N}^*$ is

$\varphi(h) = (S^{-1}(\varphi^h(00))) \cdot (h + 1), \quad h \in \mathbb{N}$,

but to make sure this definition is not circular and prove properties of $\varphi$, we need to be
more careful. We will define a sequence of morphisms $\varphi_h \colon \{0, \ldots, h\}^* \to \{0, \ldots, h + 1\}^*$ that
extend each other, and let $\varphi$ be their limit.

Definition 3. For all $h \in \mathbb{N}$, let $\varphi_h \colon \{0, \ldots, h\}^* \to \{0, \ldots, h + 1\}^*$ be defined by $\varphi_h(h') =
\varphi_{h'}(h')$ for $h' < h$ and by

$\varphi_h(h) = (S^{-1} \circ \varphi_{h-1} \circ \cdots \circ \varphi_0(00)) \cdot (h + 1)$.

Note that for $h = 0$, this definition gives $\varphi_0(0) = S^{-1}(00) \cdot 1 = 001$. Since $\varphi_h$ extends $\varphi_{h'}$ for
$h' < h$, it is meaningful to define $\varphi \colon \mathbb{N}^* \to \mathbb{N}^*$ to be their common extension.

Lemma 4. For all $h \in \mathbb{N}$, $\varphi(h)$ starts with $h$ and ends with $h + 1$. Furthermore, if $w \in
\{0, \ldots, h\}^*$, then there are as many occurrences of $h + 1$ in $\varphi(w)$ as there are occurrences of
$h$ in $w$, and each one is preceded by a 0.

Proof. We proceed by induction on $h$ with a vacuous base case. For every letter $h' < h$
that appears in $w$, the corresponding factor of $\varphi(w)$ is $\varphi(h') = \varphi_{h'}(h') \in \{0, \ldots, h' + 1\}^* \subseteq
\{0, \ldots, h\}^*$, so $\varphi(h')$ does not contain an occurrence of $h + 1$.
We also have

$\varphi_h(h) = (S^{-1} \circ \varphi_{h-1} \circ \cdots \circ \varphi_0(00)) \cdot (h + 1) \in \{0, \ldots, h\}^*(h + 1)$,

so for each occurrence of $h$ in $w$, the corresponding factor $\varphi(h) = \varphi_h(h)$ in $\varphi(w)$ contains
exactly one occurrence of $h + 1$. By induction, the last letter of $\varphi_{h-1} \circ \cdots \circ \varphi_0(00)$ is $h$, and
it is preceded by 0, so $\varphi_h(h)$ starts with $h$ and ends with $0 \cdot (h + 1)$. Since all occurrences of
$h + 1$ in $\varphi(w)$ occur in this way, this completes the proof. □
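
Definition 3 translates almost line by line into code. The following Python sketch builds the images $\varphi(0), \ldots, \varphi(m)$ as lists of integers; the function name `build_phi` and the dictionary representation are assumptions of this illustration.

```python
def build_phi(max_letter):
    """Return a dict mapping each letter h <= max_letter to phi(h), as a list of ints."""
    phi = {}
    for h in range(max_letter + 1):
        y = [0, 0]
        for _ in range(h):
            # After j rounds, y = phi_{j-1} o ... o phi_0 (00), which only uses
            # letters below h, so every image needed here is already defined.
            y = [c for letter in y for c in phi[letter]]
        # S^{-1} moves the last letter of y to the front; then append h + 1.
        phi[h] = [y[-1]] + y[:-1] + [h + 1]
    return phi
```

Calling `build_phi(2)` reproduces the three images listed above, and each image starts with $h$ and ends with $0 \cdot (h + 1)$, as in the proof of Lemma 4.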

Theorem 5. For all $h \in \mathbb{N}$, $\varphi_h$ is irreducible and irreducible after the first position with
respect to overlaps. Thus, $\varphi = \lim_{h \to \infty} \varphi_h$ has these properties.

Proof. We proceed by induction on $h$ with a vacuous base case.

First, let us show that for each $h' \le h$, the word $\varphi_h(h')$ is irreducible after the first
position. For $h' < h$, the string $h'$ is irreducible after the first position, so by induction,
$\varphi_h(h') = \varphi_{h'}(h')$ is irreducible after the first position. For $h' = h$, the word $y = \varphi_{h-1} \circ \cdots \circ
\varphi_0(00)$ is irreducible, a square, and ends with the letter $h$ by Lemma 4. Thus, the word

$hy = (S^{-1} \circ \varphi_{h-1} \circ \cdots \circ \varphi_0(00)) \cdot h$

is irreducible after the first position and is an overlap, so the word

$\varphi_h(h) = (S^{-1} \circ \varphi_{h-1} \circ \cdots \circ \varphi_0(00)) \cdot (h + 1)$

is irreducible after the first position.

Now let $w \in \{0, \ldots, h\}^*$ be a word. Then $\varphi_h(w)$ can be broken up into blocks
corresponding to the images under $\varphi_h$ of the individual letters in $w$. By the remarks above, each
position in $\varphi_h(w)$ that is not the first position of a block is irreducible.

By Lemma 4, we can recover $w$ from $\varphi_h(w) = \varphi(w)$ by taking the letters in the first
position of each block. Suppose we replace the letter $d$ at one of these positions $p$ by a letter
$c < d$. If this creates an overlap in $w$ ending at $p$, then position $p$ is preceded by a square
$xx$ in $w$ that begins with $c$. This gives a square $\varphi_h(x)\varphi_h(x)$ in $\varphi_h(w)$ that begins with $c$, so
replacing $d$ by $c$ in $\varphi_h(w)$ creates an overlap ending at that position. Thus, if $w$ is irreducible
at a position, then $\varphi_h(w)$ is irreducible at the first position of the corresponding block. If $w$
is irreducible or irreducible after the first position, then $\varphi_h(w)$ has the same property, so $\varphi_h$
is both irreducible and irreducible after the first position. □

Theorem 6. For all $h \in \mathbb{N}$, $\varphi_h$ is an overlap-free morphism. Thus, $\varphi = \lim_{h \to \infty} \varphi_h$ is an
overlap-free morphism.

Proof. We proceed by induction on $h$ with a vacuous base case.

First, let us show that for each $h' \le h$, the word $\varphi_h(h')$ is overlap-free. For $h' < h$, the
string $h'$ is overlap-free, so by induction, $\varphi_h(h') = \varphi_{h'}(h')$ is overlap-free. For $h' = h$, the
word 00 is overlap-free, so $y = \varphi_{h-1} \circ \cdots \circ \varphi_0(00)$ is overlap-free. By Lemma 4, $y$ contains
exactly two occurrences of the letter $h$, and no occurrences of the letter $h + 1$. Thus,

$hy = (S^{-1} \circ \varphi_{h-1} \circ \cdots \circ \varphi_0(00)) \cdot h$

is itself an overlap, and it does not contain any other overlap, so

$\varphi_h(h) = (S^{-1} \circ \varphi_{h-1} \circ \cdots \circ \varphi_0(00)) \cdot (h + 1)$

is overlap-free.

Now let $w \in \{0, \ldots, h\}^*$ be a word, and break up $\varphi_h(w)$ into blocks corresponding to the
images under $\varphi_h$ of the individual letters in $w$. Suppose $x$ is an overlap of length $2n + 1$ in
$\varphi_h(w)$, so that $x[i] = x[i + n]$ for all $1 \le i \le n + 1$. We want to show that $w$ contains an
overlap.

Since $x$ is not contained in a single block of $\varphi_h(w)$ by the remarks above, let $x[k]$ be the
start of the block $B$ containing $x[2n + 1]$, so that $1 < k \le 2n + 1$. Then $x[k - 1]$ is the end
of a block, so it is not 0. If $k \le n + 1$, then $x[k + n] = x[k]$ is contained in the block $B$, and
it follows from Lemma 4 that $x[k - 1] = x[k + n - 1] = 0$, a contradiction. Thus, we actually
have $n + 1 < k \le 2n + 1$.

Since the block $B$ starting at $x[k]$ contains $x[2n + 1]$, we have $x[k] \ge x[k], \ldots, x[2n]$.
Consider the block $A$ that contains $x[k - n]$, and say $A$ starts at $x[j]$ and ends at $x[\ell]$. Since
$x[k - n - 1] = x[k - 1] \ne 0$, the block $A$ does not end at $x[k - n]$, so $x[j] \ge x[k - n]$.
Since $x[k - n] \ge x[k - n], \ldots, x[n]$, the block $A$ does not end at any of these positions, so
$n + 1 \le \ell < k$. Since $x[\ell - n] = x[\ell] > x[j]$, the block $A$ starts after this position, so
$1 \le \ell - n < j \le k - n \le n + 1$.

Since $x[j - 1]$ is the end of a block, it is not 0. Thus, $x[j + n - 1] \ne 0$, so $x[j + n]$ is not the
end of a block. Since $x[j] \ge x[j], \ldots, x[\ell - 1]$, we have $x[j + n] \ge x[j + n], \ldots, x[2n]$, so the
block containing $x[j + n]$ does not end at any of these positions. Thus, the block containing
$x[j + n]$ also contains $x[2n + 1]$, so $j + n \ge k$. Since we have $j \le k - n$ from above, we have
$j = k - n$.

The picture so far is that $x[k]$ is the start of the block $B$ containing $x[2n + 1]$, and that
$x[k - n]$ is the start of the block $A$ that ends at $x[\ell]$, where $n + 1 \le \ell < k \le 2n + 1$. Let $sxt$
be the factor of $\varphi_h(w)$ formed by taking the blocks containing $x$. Then we have

$sxt = s \cdot x[1 \ldots k - n - 1] \cdot x[k - n \ldots \ell] \cdot x[\ell + 1 \ldots k - 1] \cdot x[k \ldots 2n + 1] \cdot t$
$= s \cdot x[1 \ldots k - n - 1] \cdot \varphi_h(x[k]) \cdot x[\ell + 1 \ldots k - 1] \cdot \varphi_h(x[k])$
$= s \cdot x[1 \ldots \ell - n] \cdot \varphi_h(z) \cdot \varphi_h(x[k]) \cdot \varphi_h(z) \cdot \varphi_h(x[k])$
$= \varphi_h(x[k]) \cdot \varphi_h(z) \cdot \varphi_h(x[k]) \cdot \varphi_h(z) \cdot \varphi_h(x[k])$
$= \varphi_h(x[k] \cdot z \cdot x[k] \cdot z \cdot x[k])$,

where $z$ is a (possibly empty) string, and the next-to-last equality holds because the string
$s \cdot x[1 \ldots \ell - n]$ is non-empty and ends with $x[\ell - n] = x[\ell] = x[k] + 1$. By Lemma 4, $\varphi_h(h')$
starts with $h'$ and ends with $h' + 1$ for every letter $h' \le h$, so the last letter of a block (or
the first one) completely determines which letter it comes from, and it follows that $\varphi_h$ is
injective. Thus, $w$ must actually contain the string $x[k] \cdot z \cdot x[k] \cdot z \cdot x[k]$, which is an overlap.
This shows that $\varphi_h$ is overlap-free. □

Remark 7. Note that our proof of Theorem 6 only depends on three facts about $\varphi(h)$. For
all $h \in \mathbb{N}$,

1. $\varphi(h)$ is overlap-free;

2. $\varphi(h) \in h\{0, \ldots, h\}^*(h + 1)$;

3. every occurrence of $h$ or $h + 1$ in $\varphi(h)$ after the first letter is preceded by 0.

Thus, we know that various other morphisms from $\mathbb{N}^* \to \mathbb{N}^*$, such as the morphism defined
by $h \mapsto h \cdot 0 \cdot (h + 1)$, are also overlap-free.

Corollary 8. The word $\varphi^\omega(0)$ is the lexicographically least infinite overlap-free word over $\mathbb{N}$.

Proof. By Theorems 5 and 6, the infinite word $\varphi^\omega(0)$ is overlap-free and irreducible, so by the
remarks of Section 4, it is the lexicographically least infinite overlap-free word over $\mathbb{N}$. □
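
Corollary 8 can be sanity-checked on a finite prefix by comparing the greedy no-backtracking construction with an iterate of $\varphi$. The Python sketch below is only an illustration under the definitions of this section; all function names are chosen here and do not come from the paper.

```python
def build_phi(max_letter):
    """Images phi(0), ..., phi(max_letter), built as in Definition 3."""
    phi = {}
    for h in range(max_letter + 1):
        y = [0, 0]
        for _ in range(h):
            y = [c for letter in y for c in phi[letter]]
        phi[h] = [y[-1]] + y[:-1] + [h + 1]
    return phi


def phi_power_of_zero(n):
    """Return phi^n(0); only the images of letters below n are needed."""
    phi = build_phi(n - 1) if n > 0 else {}
    word = [0]
    for _ in range(n):
        word = [c for letter in word for c in phi[letter]]
    return word


def ends_with_overlap(w):
    """True if w has a suffix of length 2L + 1 with period L >= 1."""
    n = len(w)
    return any(w[n - 2 * L - 1:n - L] == w[n - L - 1:]
               for L in range(1, (n - 1) // 2 + 1))


def greedy_overlap_free(length):
    """No-backtracking algorithm for overlaps over the natural numbers."""
    w = []
    while len(w) < length:
        c = 0
        while ends_with_overlap(w + [c]):
            c += 1
        w.append(c)
    return w
```

The word `phi_power_of_zero(3)` has 79 letters, and it can be compared letter by letter with `greedy_overlap_free(79)`.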

This shows our main result, that $w_{2^+} = \varphi^\omega(0)$, but we also get the following interesting
corollary, which is the starting point for our exploration of the structure of $\varphi$ in the next
section.

Corollary 9. For all $0 \le h \le k$, let $\psi(h, k)$ be the lexicographically least overlap-free word
over $\mathbb{N}$ that starts with $h$ and ends with $k$. Then $\psi(h, k) = \varphi^{k-h}(h)$.

Proof. Let $w$ be any overlap-free word starting with $h$ and ending with $k$. The word $h$
is overlap-free and irreducible after the first position, so by Theorems 5 and 6 the word
$\varphi^{k-h}(h)$ is overlap-free and irreducible after the first position. Also, by Lemma 4, it starts
with $h$ and contains a single occurrence of $k$, at the end. Since $w$ contains $k$, it cannot
be a proper prefix of $\varphi^{k-h}(h)$. Since $\varphi^{k-h}(h)$ is irreducible after the first position and $w$ is
overlap-free and starts with the same letter, we have $\varphi^{k-h}(h) \le w$ lexicographically. Thus,
$\psi(h, k) = \varphi^{k-h}(h)$. □

7 More about the overlap-free words $\psi(h, k)$

For $0 \le h \le \ell \le k$, the word $\psi(h, k)$ has $\psi(h, \ell)$ as a prefix and $\psi(\ell, k)$ as a suffix, since

$\psi(h, k) = \varphi^{\ell-h}(\varphi^{k-\ell}(h)) = \varphi^{k-\ell}(\varphi^{\ell-h}(h))$,

and $\varphi^{k-\ell}(h)$ starts with $h$, and $\varphi^{\ell-h}(h)$ ends with $\ell$.

However, we can be much more precise than this about the structure of $\psi(h, k)$. Letting
$a(h, k) = |\psi(h, k)|$, we have the following result:

Theorem 10. Let $0 \le h \le k$. Then

$\psi(h, k) = \left( \prod_{\ell=h}^{k-1} S^{-a(\ell, k-1)}(\psi(0, k - 1))^2 \right) \cdot k$.

Proof. The result is immediate for $h = k$, since then $\psi(k, k) = k$. For $h < k$ we have

$\psi(h, k) = \varphi^{k-h-1}(S^{-1}(\varphi^h(00)) \cdot (h + 1))$
$= S^{-|\varphi^{k-h-1}(h)|}(\varphi^{k-1}(00)) \cdot \varphi^{k-h-1}(h + 1)$
$= S^{-|\psi(h, k-1)|}(\psi(0, k - 1))^2 \cdot \psi(h + 1, k)$,

and the result follows by induction on $k - h$. □

Corollary 11. We have the recurrence

$a(h, k) = 2(k - h)a(0, k - 1) + 1$

for $0 \le h < k$ and $k \ge 1$, with initial condition $a(0, 0) = 1$. Furthermore,

$a(0, k) = \sum_{\ell=0}^{k} 2^{k-\ell} \frac{k!}{\ell!}$.

Proof. Direct calculation. □

Table 1 below gives the first few values of $a(h, k)$.
h\k      0      1      2      3      4      5
 0       1      3     13     79    633   6331
 1              1      7     53    475   5065
 2                     1     27    317   3799
 3                            1    159   2533
 4                                   1   1267
 5                                          1

Table 1: Some values of $a(h, k) = |\psi(h, k)|$.
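
The entries of Table 1 follow from the recurrence of Corollary 11. A short Python sketch, with the function name `a` chosen to mirror the notation and the convention $a(k, k) = 1$ taken from the proof of Theorem 10, is:

```python
import math


def a(h, k):
    """Length of psi(h, k), from a(h, k) = 2 (k - h) a(0, k - 1) + 1 and a(k, k) = 1."""
    if h == k:
        return 1
    return 2 * (k - h) * a(0, k - 1) + 1


def a0_closed_form(k):
    """The sum over l of 2^(k - l) k! / l!, which satisfies the same recurrence as a(0, k)."""
    return sum(2 ** (k - l) * math.factorial(k) // math.factorial(l)
               for l in range(k + 1))
```

For example, `[a(0, k) for k in range(6)]` gives the first row of the table, and `a(h, h + 1)` gives the lengths $|\varphi(h)|$.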

Note that the sequence $(a(0, k))_k = (1, 3, 13, 79, 633, \ldots)$ is Sloane's sequence A010844
and has exponential generating function $\exp(x)/(1 - 2x)$. Another sequence of interest is
$(|\varphi(h)|)_h = (3, 7, 27, 159, 1267, \ldots)$, which is given by $|\varphi(h)| = a(h, h + 1) = 2a(0, h) + 1$.

Note also that given the structure from Theorem 10 and the fact that the function $a(0, k)$
grows quite fast and can be computed easily, it is possible to compute $w_{2^+}[n]$ in time bounded
by a polynomial in $\log n$, using the following algorithm:

function eval(n):
    // Find the first value of a(0, k) which is at least n.
    k := 0;
    a[0] := 1;
    while (a[k] < n) do
        k := k + 1;
        a[k] := 2k a[k - 1] + 1;
    // Compute ψ(0, k)[n]. This quantity is the loop invariant.
    while (k > 0) do
        if (n = a[k]) then
            // The last letter of ψ(0, k) is k.
            return(k);
        else
            // The letter falls in a block of the form S^{-a(ℓ, k-1)}(ψ(0, k - 1))^2.
            if (k - 1 > 0) then
                ℓ := ⌊(n - 1) / (2 a[k - 1])⌋;
                shift := 2ℓ a[k - 2];
            else
                shift := 0;
            // Maintain the loop invariant while reducing k.
            n := ((n + shift - 1) mod a[k - 1]) + 1;
            k := k - 1;
    // Here k = 0, and ψ(0, 0) is the single letter 0.
    return(0);
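
For readers who want to run the algorithm, here is one possible Python transcription. The name `letter_at` is an assumption of this sketch, and the helper `reference_prefix`, which rebuilds $\varphi^m(0)$ directly from the definition of $\varphi$, is included only so that the two computations can be compared on a finite range.

```python
def letter_at(n):
    """Return the n-th letter (1-indexed) of the overlap-free word w_{2+}."""
    a = [1]  # a[k] stores a(0, k)
    k = 0
    while a[k] < n:
        k += 1
        a.append(2 * k * a[k - 1] + 1)
    # Invariant: the answer is psi(0, k)[n], with 1 <= n <= a[k].
    while k > 0:
        if n == a[k]:
            return k  # the last letter of psi(0, k) is k
        if k - 1 > 0:
            ell = (n - 1) // (2 * a[k - 1])  # squared block containing position n
            shift = 2 * ell * a[k - 2]       # congruent to -a(ell, k - 1) mod a[k - 1]
        else:
            shift = 0
        n = (n + shift - 1) % a[k - 1] + 1
        k -= 1
    return 0  # psi(0, 0) is the single letter 0


def reference_prefix(iterations):
    """Build phi^iterations(0) from the definition of phi, for comparison only."""
    phi = {}
    for h in range(iterations):
        y = [0, 0]
        for _ in range(h):
            y = [c for letter in y for c in phi[letter]]
        phi[h] = [y[-1]] + y[:-1] + [h + 1]
    word = [0]
    for _ in range(iterations):
        word = [c for letter in word for c in phi[letter]]
    return word
```

The outer loop runs at most $k$ times, and $a(0, k)$ grows at least as fast as $2^k k!$, so the number of arithmetic operations is small even for very large $n$.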

The next thing to consider is the frequency of each letter in $w_{2^+}$ and the words $\psi(h, k)$.
For fixed points of morphisms over a finite alphabet generated by iteration, it is well-known
that the frequency of a letter, if it exists, must be an algebraic number [2, Thm. 8.4.5].
Now $w_{2^+}$ is the fixed point of a morphism, but over an infinite alphabet. As we will see, the
frequency of each letter is transcendental.

The following corollary gives the distribution of letters for $\psi(0, k)$, from which the
distribution of letters for $\psi(h, k)$ can easily be computed.

Corollary 12. Let $d(h, k)$ be the number of times $h$ occurs in $\psi(0, k)$. Then we have the
recurrence

$d(h, k) = 2k \, d(h, k - 1)$

for $0 \le h < k$, with initial conditions $d(k, k) = 1$ for $k \ge 0$. Hence $d(h, k) = 2^{k-h} k!/h!$ for
$0 \le h \le k$.

Proof. The recurrence follows directly from Theorem 10, and the rest is a direct calculation. □

Table 2 below gives the first few values of $d(h, k)$.

h\k      0      1      2      3      4      5
 0       1      2      8     48    384   3840
 1              1      4     24    192   1920
 2                     1      6     48    480
 3                            1      8     80
 4                                   1     10
 5                                          1

Table 2: Some values of $d(h, k)$, giving the letter frequencies in $\psi(0, k)$.

Theorem 13. For all $k \in \mathbb{N}$, the limiting frequency of the letter $k$ in $w_{2^+}$ exists and is equal
to $1/(2^k k! \sqrt{e})$.

Proof. First we establish the relative frequencies of the letters. For all letters $k \in \mathbb{N}$ and
lengths $n \ge 1$, let $f(k, n)$ be the number of occurrences of the letter $k$ in the prefix of $w_{2^+}$ of
length $n$. From Theorem 10, we know that rotations of $\psi(0, k)$ of the form $S^{-a(h, k)}(\psi(0, k))$
with $0 \le h \le k$ can be decomposed as concatenations of the letter $k$ and of rotations of
$\psi(0, k - 1)$ of the form $S^{-a(h', k-1)}(\psi(0, k - 1))$ with $0 \le h' \le k - 1$, in some order. By
induction, it follows that for all $k' \ge k$, $\psi(0, k')$ can be decomposed as a concatenation of
letters greater than $k$ and of rotations of $\psi(0, k)$ in some order.

Thus, each prefix of $w_{2^+}$ consists of letters greater than $k$, a certain number of rotations of
$\psi(0, k)$, and possibly a prefix of a rotation of $\psi(0, k)$. Since each rotation of $\psi(0, k)$ contains
a single occurrence of the letter $k$ and exactly $2k$ occurrences of the letter $k - 1$, we have
$\lim_{n \to \infty} f(k - 1, n)/f(k, n) = 2k$ and

$\lim_{n \to \infty} \frac{f(k, n)}{f(0, n)} = \frac{1}{2^k k!}$.

Next we show that $\lim_{n \to \infty} n/f(0, n) = \sqrt{e}$. Letting $F(\ell, n) = \sum_{k=0}^{\ell} f(k, n)$, we have

$\lim_{n \to \infty} \frac{F(\ell, n)}{f(0, n)} = \sum_{k=0}^{\ell} \frac{1}{2^k k!}$.

Since $\lim_{\ell \to \infty} F(\ell, n) = n$ pointwise, it is tempting to conclude that

$\lim_{n \to \infty} \frac{n}{f(0, n)} = \sum_{k=0}^{\infty} \frac{1}{2^k k!} = \sqrt{e}$,

but for that we need some kind of uniform convergence, which we establish below.
From the remarks above on the decomposition of $\psi(0, k')$, it follows that for all $k, n$,

$f(k, n) \le \frac{n}{a(0, k)} + 1$.

Also, since the letter $k$ first appears at position $a(0, k)$, we have the more convenient bound

$f(k, n) \le \frac{2n}{a(0, k)} \le \frac{2n}{2^k k!}$,

which is stronger for small values of $n$. Since $\sum_{k=0}^{\infty} 1/(2^k k!)$ is a convergent series, we get

$\lim_{\ell \to \infty} \frac{1}{n} \sum_{k=\ell+1}^{\infty} f(k, n) = 0$

uniformly in $n$.

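Since the convergence is uniform in $n$ and the relative letter frequencies converge, writing

$\frac{n}{f(0, n)} = \frac{F(\ell, n)}{f(0, n)} \cdot \frac{n}{F(\ell, n)}$

for large enough $\ell$ shows that the quantity $n/f(0, n)$ can be bounded independently of $n$.
Then, we have

$\lim_{\ell \to \infty} \frac{F(\ell, n)}{f(0, n)} = \lim_{\ell \to \infty} \frac{n}{f(0, n)} \cdot \frac{F(\ell, n)}{n} = \frac{n}{f(0, n)}$

uniformly in $n$, so we get

$\lim_{n \to \infty} \frac{n}{f(0, n)} = \lim_{\ell \to \infty} \lim_{n \to \infty} \frac{F(\ell, n)}{f(0, n)} = \sqrt{e}$

as desired, and the result follows. □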
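
As a numerical illustration (not a substitute for the proof), the frequency of the letter $h$ in the finite word $\psi(0, k)$ is $d(h, k)/a(0, k)$ by Corollary 12, and it approaches $1/(2^h h! \sqrt{e})$ very quickly as $k$ grows. The Python sketch below uses function names chosen for this example only.

```python
import math


def d(h, k):
    """Number of occurrences of h in psi(0, k): 2^(k - h) k! / h!."""
    return 2 ** (k - h) * math.factorial(k) // math.factorial(h)


def a0(k):
    """Length of psi(0, k), as the sum of its letter counts."""
    return sum(d(h, k) for h in range(k + 1))


def frequency_in_psi(h, k):
    """Frequency of the letter h in the finite word psi(0, k)."""
    return d(h, k) / a0(k)


def limiting_frequency(h):
    """The value 1 / (2^h h! sqrt(e)) from Theorem 13."""
    return 1 / (2 ** h * math.factorial(h) * math.sqrt(math.e))
```

Comparing `frequency_in_psi(h, 30)` with `limiting_frequency(h)` for small `h` shows agreement to floating-point precision.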

Finally, we can also derive some symmetry properties of $\psi(h, k)$. A surprising number of
them follow from the next result.

Theorem 14. For $k \in \mathbb{N}$, we have $S^{-1}(\psi(0, k)) = R(\psi(0, k))$.

Proof. We proceed by induction on $k$ with a vacuous base case. For $0 \le h \le k - 1$, we have
that $a(0, k - 1) - a(h, k - 1) = a(k - 1 - h, k - 1) - 1$, so

$S^{-1}(\psi(0, k)) = k \cdot \prod_{h=0}^{k-1} S^{-a(h, k-1)}(\psi(0, k - 1))^2$
$= k \cdot \prod_{h=0}^{k-1} S^{a(k-1-h, k-1)}(S^{-1}(\psi(0, k - 1)))^2$
$= k \cdot \prod_{h=0}^{k-1} S^{a(k-1-h, k-1)}(R(\psi(0, k - 1)))^2$
$= k \cdot \prod_{h=0}^{k-1} R(S^{-a(k-1-h, k-1)}(\psi(0, k - 1)))^2$
$= k \cdot R\left( \prod_{h'=0}^{k-1} S^{-a(h', k-1)}(\psi(0, k - 1))^2 \right)$
$= R(\psi(0, k))$. □

Corollary 15.

(a) For all $0 \le h \le k$, $R(\psi(k - h, k)) \cdot \psi(h, k) = k \cdot \psi(0, k)$;

(b) For all $k \ge 0$, the word $\psi(0, k) \cdot k^{-1}$ is a palindrome;

(c) For all $h \in \mathbb{N}$, $\varphi(h) = S^{-1}(\psi(0, h))^2 \cdot (h + 1) = R(\psi(0, h))^2 \cdot (h + 1)$;

(d) For all $h \in \mathbb{N}$, the word $h^{-1} \cdot \varphi(h) \cdot (h + 1)^{-1}$ is a palindrome;

(e) For all $h \in \mathbb{N}$, the word $\varphi \circ R \circ \varphi(h)$ is a palindrome.

Proof.

(a) By Theorem 14, we have

$k \cdot \psi(0, k) = S^{-1}(\psi(0, k)) \cdot k = R(\psi(0, k)) \cdot k$.

We know that $\psi(h, k)$ is a suffix of $\psi(0, k)$ of length $a(h, k)$ and that $R(\psi(k - h, k))$ is
a prefix of $R(\psi(0, k))$ of length $a(k - h, k)$. By Corollary 11, $a(k - h, k) + a(h, k) =
1 + a(0, k)$, so this prefix and this suffix actually form all of $k \cdot \psi(0, k)$ together.

(b) From part (a), we know that $k \cdot \psi(0, k) = R(\psi(0, k)) \cdot k = R(k \cdot \psi(0, k))$ is a palindrome,
and removing the first and last letter gives the palindrome $\psi(0, k) \cdot k^{-1}$.

(c) From Corollary 9 and Theorem 14, we get

$\varphi(h) = S^{-1}(\varphi^h(0))^2 \cdot (h + 1)$
$= S^{-1}(\psi(0, h))^2 \cdot (h + 1)$
$= R(\psi(0, h))^2 \cdot (h + 1)$.

(d) From part (c), we have

$\varphi(h) = R(\psi(0, h))^2 \cdot (h + 1) = h \cdot R(\psi(0, h) \cdot h^{-1}) \cdot h \cdot R(\psi(0, h) \cdot h^{-1}) \cdot (h + 1)$,

and from part (b), $\psi(0, h) \cdot h^{-1}$ is a palindrome.

(e) We have

$\varphi \circ R \circ \varphi(h) = \varphi \circ R(R(\varphi^h(00)) \cdot (h + 1))$
$= \varphi((h + 1) \cdot \varphi^h(00))$
$= R(\varphi^{h+1}(00)) \cdot (h + 2) \cdot \varphi^{h+1}(00)$,

which is a palindrome. □
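
The symmetry statements of Theorem 14 and Corollary 15 lend themselves to a quick finite check. In the Python sketch below, words are lists of integers, `rotate_right` plays the role of $S^{-1}$, and reversal $R$ is the slice `[::-1]`; all names are illustrative assumptions, and the check covers only small letters.

```python
def build_phi(max_letter):
    """Images phi(0), ..., phi(max_letter), built as in Definition 3."""
    phi = {}
    for h in range(max_letter + 1):
        y = [0, 0]
        for _ in range(h):
            y = [c for letter in y for c in phi[letter]]
        phi[h] = [y[-1]] + y[:-1] + [h + 1]
    return phi


def apply_morphism(images, word):
    """Apply a morphism given by its letter images to a word."""
    return [c for letter in word for c in images[letter]]


def psi(phi, h, k):
    """psi(h, k) = phi^(k - h)(h); needs the images of all letters below k."""
    word = [h]
    for _ in range(k - h):
        word = apply_morphism(phi, word)
    return word


def rotate_right(w):
    """The inverse shift S^{-1}: move the last letter to the front."""
    return [w[-1]] + w[:-1]


def is_palindrome(w):
    return w == w[::-1]
```

With `phi = build_phi(3)`, one can test, for instance, that `rotate_right(psi(phi, 0, k)) == psi(phi, 0, k)[::-1]` for every `k` up to 4, and that `phi[h][1:-1]` is a palindrome.
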
Given a morphism $\xi$, we can define its reversed morphism $\xi_R$ by reversing the image of
each letter, so that $\xi_R = R \circ \xi \circ R$. Some rare morphisms, such as the Thue-Morse morphism
$\mu \colon \{0, 1\}^* \to \{0, 1\}^*$ defined by $\mu(0) = 01$ and $\mu(1) = 10$, have the property that they
commute with their reversed morphism. The morphism $\varphi$ also has this property.

Corollary 16. The morphisms $\varphi$ and $\varphi_R$ commute.

Proof. It is enough to check that $\varphi \circ \varphi_R(h) = \varphi_R \circ \varphi(h)$ for all $h \in \mathbb{N}$. By Corollary 15, we
have

$\varphi \circ \varphi_R(h) = \varphi \circ R \circ \varphi \circ R(h) = \varphi \circ R \circ \varphi(h) = R \circ \varphi \circ R \circ \varphi(h) = \varphi_R \circ \varphi(h)$. □

We have already established the link between $\varphi$ and $\psi(h, k)$, but there is also a link
between $\varphi_R$ and $\psi(h, k)$.

Theorem 17. For all $0 \le h \le k$ and $i \ge 0$, we have

$\psi(h, k + i) = \varphi^i(\psi(h, k))$
$\psi(h + i, k + i) = \varphi_R^i(\psi(h, k) \cdot k^{-1}) \cdot (k + i)$.

Proof. The first equality follows directly from Corollary 9. For the second equality, note
that Corollary 15 gives

$R(\psi(k - h, k)) \cdot \psi(h, k) = R(\psi(0, k)) \cdot k$,

so that

$\psi(h, k) \cdot k^{-1} = R(\psi(0, k) \cdot \psi(k - h, k)^{-1})$.

Applying $\varphi_R^i = R \circ \varphi^i \circ R$ to both sides gives

$\varphi_R^i(\psi(h, k) \cdot k^{-1}) = R(\varphi^i(\psi(0, k) \cdot \psi(k - h, k)^{-1}))$
$= R(\psi(0, k + i) \cdot \psi(k - h, k + i)^{-1})$
$= \psi(h + i, k + i) \cdot (k + i)^{-1}$. □
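
Corollary 16 and Theorem 17 can also be spot-checked on small letters. The sketch below represents the reversed morphism by reversing each letter image; the names `reversed_morphism` and `iterate` are assumptions made for this illustration.

```python
def build_phi(max_letter):
    """Images phi(0), ..., phi(max_letter), built as in Definition 3."""
    phi = {}
    for h in range(max_letter + 1):
        y = [0, 0]
        for _ in range(h):
            y = [c for letter in y for c in phi[letter]]
        phi[h] = [y[-1]] + y[:-1] + [h + 1]
    return phi


def reversed_morphism(images):
    """The reversed morphism: reverse the image of each letter."""
    return {letter: image[::-1] for letter, image in images.items()}


def apply_morphism(images, word):
    return [c for letter in word for c in images[letter]]


def iterate(images, word, times):
    """Apply the morphism `times` times."""
    for _ in range(times):
        word = apply_morphism(images, word)
    return word


def psi(phi, h, k):
    """psi(h, k) = phi^(k - h)(h)."""
    return iterate(phi, [h], k - h)
```

With `phi = build_phi(3)` and `phi_R = reversed_morphism(phi)`, the two compositions agree on each letter that stays within the computed range, and `psi(phi, h + i, k + i)` equals `iterate(phi_R, psi(phi, h, k)[:-1], i) + [k + i]` for the small cases tried.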

8 Further questions 

For the sake of simplicity, we have presented the proofs in this paper for squares and overlaps,
but they can be extended easily enough to the case of arbitrary $n$th powers and $n^+$-powers
for integer $n$, which is very similar. However, the case of fractional powers seems harder.
(For a definition of fractional powers, see, for example, [2, p. 23].) In fact, it is not even
clear that an infinite alphabet is needed. For example, the first million letters of $w_{5/2}$, the
lexicographically least infinite word over $\mathbb{N}$ avoiding all powers with exponent $\ge 5/2$, are all
in $\{0, 1, 2\}$.

Several other patterns $P$, especially when considered over $\mathbb{N}$, have the property that any
finite $P$-free word can be extended to a longer word, in which case the no-backtracking
algorithm will work. In such cases, the lexicographically least infinite $P$-free word is irreducible.
One can ask, when is this word generated by a $P$-free irreducible morphism?